{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b1acc12a-712e-41e6-8e30-41f9de223543",
   "metadata": {},
   "source": [
    "# Multimodal RAG with LlamaIndex"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c557723-257f-4746-9b84-ec77c50cf405",
   "metadata": {},
   "source": [
    "This cookbook shows how to perform RAG on the table and text extraction output of nv-ingest's pdf extraction tools using LlamaIndex"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "baecfda5-137b-43da-a8d4-23dd47131be9",
   "metadata": {},
   "source": [
    "To start we'll need to make sure we have llama_index installed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d75fdaa-3085-4b77-9289-15276d5cd1c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "pip install llama_index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45412661-9516-47f9-8bea-6f857e0e173f",
   "metadata": {},
   "source": [
    "Then, we'll use nv-ingest to parse an example pdf that contains text, tables, charts, and images. We'll need to make sure to have the nv-ingest microservice up and running at localhost:7670 along with the supporting NIMs. To do this, follow the nv-ingest [quickstart guide](https://github.com/NVIDIA/nv-ingest?tab=readme-ov-file#quickstart). Once the microservice is ready we can create a job with the nv-ingest python client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9337daf9-7342-427a-a95e-2d9ac9b5076b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from nv_ingest_client.client import NvIngestClient\n",
    "from nv_ingest_client.primitives import JobSpec\n",
    "from nv_ingest_client.primitives.tasks import ExtractTask\n",
    "from nv_ingest_client.primitives.tasks import SplitTask\n",
    "from nv_ingest_client.util.file_processing.extract import extract_file_content\n",
    "import logging, time\n",
    "\n",
    "logger = logging.getLogger(\"nv_ingest_client\")\n",
    "\n",
    "file_name = \"../data/multimodal_test.pdf\"\n",
    "file_content, file_type = extract_file_content(file_name)\n",
    "\n",
    "job_spec = JobSpec(\n",
    "    document_type=file_type,\n",
    "    payload=file_content,\n",
    "    source_id=file_name,\n",
    "    source_name=file_name,\n",
    "    extended_options={\"tracing_options\": {\"trace\": True, \"ts_send\": time.time_ns()}},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2aa9a74c-f7e4-475d-970d-cf820cd8ea19",
   "metadata": {},
   "source": [
    "And then we can and submit a task to extract the text and tables from the example pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6f07d820-0ae6-4681-88da-3f8857ea71fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "extract_task = ExtractTask(\n",
    "    document_type=file_type,\n",
    "    extract_text=True,\n",
    "    extract_images=False,\n",
    "    extract_tables=True,\n",
    ")\n",
    "\n",
    "\n",
    "job_spec.add_task(extract_task)\n",
    "client = NvIngestClient()\n",
    "job_id = client.add_job(job_spec)\n",
    "\n",
    "client.submit_job(job_id, \"morpheus_task_queue\")\n",
    "\n",
    "result = client.fetch_job_result(job_id, timeout=60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "07d41434-a5cb-4cc3-aaad-a57603abda44",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'document_type': 'text',\n",
       " 'metadata': {'content': 'TestingDocument\\r\\nA sample document with headings and placeholder text\\r\\nIntroduction\\r\\nThis is a placeholder document that can be used for any purpose. It contains some \\r\\nheadings and some placeholder text to fill the space. The text is not important and contains \\r\\nno real value, but it is useful for testing. Below, we will have some simple tables and charts \\r\\nthat we can use to confirm Ingest is working as expected.\\r\\nTable 1\\r\\nThis table describes some animals, and some activities they might be doing in specific \\r\\nlocations.\\r\\nAnimal Activity Place\\r\\nGira@e Driving a car At the beach\\r\\nLion Putting on sunscreen At the park\\r\\nCat Jumping onto a laptop In a home o@ice\\r\\nDog Chasing a squirrel In the front yard\\r\\nChart 1\\r\\nThis chart shows some gadgets, and some very fictitious costs. Section One\\r\\nThis is the first section of the document. It has some more placeholder text to show how \\r\\nthe document looks like. The text is not meant to be meaningful or informative, but rather to \\r\\ndemonstrate the layout and formatting of the document.\\r\\n• This is the first bullet point\\r\\n• This is the second bullet point\\r\\n• This is the third bullet point\\r\\nSection Two\\r\\nThis is the second section of the document. It is more of the same as we’ve seen in the rest \\r\\nof the document. The content is meaningless, but the intent is to create a very simple \\r\\nsmoke test to ensure extraction is working as intended. This will be used in CI as time goes \\r\\non to ensure that changes we make to the library do not negatively impact our accuracy.\\r\\nTable 2\\r\\nThis table shows some popular colors that cars might come in.\\r\\nCar Color1 Color2 Color3\\r\\nCoupe White Silver Flat Gray\\r\\nSedan White Metallic Gray Matte Gray\\r\\nMinivan Gray Beige Black\\r\\nTruck Dark Gray Titanium Gray Charcoal\\r\\nConvertible Light Gray Graphite Slate Gray\\r\\nPicture\\r\\nBelow, is a high-quality picture of some shapes. Chart 2\\r\\nThis chart shows some average frequency ranges for speaker drivers.\\r\\nConclusion\\r\\nThis is the conclusion of the document. It has some more placeholder text, but the most \\r\\nimportant thing is that this is the conclusion. As we end this document, we should have \\r\\nbeen able to extract 2 tables, 2 charts, and some text including 3 bullet points.',\n",
       "  'content_metadata': {'description': 'Unstructured text from PDF document.',\n",
       "   'hierarchy': {'block': -1,\n",
       "    'line': -1,\n",
       "    'nearby_objects': {'images': {'bbox': [], 'content': []},\n",
       "     'structured': {'bbox': [], 'content': []},\n",
       "     'text': {'bbox': [], 'content': []}},\n",
       "    'page': -1,\n",
       "    'page_count': 3,\n",
       "    'span': -1},\n",
       "   'page_number': -1,\n",
       "   'subtype': '',\n",
       "   'type': 'text'},\n",
       "  'debug_metadata': None,\n",
       "  'embedding': None,\n",
       "  'error_metadata': None,\n",
       "  'image_metadata': None,\n",
       "  'info_message_metadata': None,\n",
       "  'raise_on_failure': False,\n",
       "  'source_metadata': {'access_level': 1,\n",
       "   'collection_id': '',\n",
       "   'date_created': '2024-09-04T17:17:41.657851',\n",
       "   'last_modified': '2024-09-04T17:17:41.657538',\n",
       "   'partition_id': -1,\n",
       "   'source_id': '../data/multimodal_test.pdf',\n",
       "   'source_location': '',\n",
       "   'source_name': '../data/multimodal_test.pdf',\n",
       "   'source_type': 'PDF',\n",
       "   'summary': ''},\n",
       "  'table_metadata': None,\n",
       "  'text_metadata': {'keywords': '',\n",
       "   'language': 'en',\n",
       "   'summary': '',\n",
       "   'text_location': [-1, -1, -1, -1],\n",
       "   'text_type': 'document'}}}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result[0][0][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a2a0a9c-ede6-4cef-b182-9f0b78223647",
   "metadata": {},
   "source": [
    "Now, we have the extraction results in the nv-ingest metadata format. We'll separate the content out of this and load it into LlamaIndex documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c2b0d5eb-f0db-4edb-a3fe-c3bfccc59227",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "\n",
    "texts = []\n",
    "tables = []\n",
    "for element in result[0][0]:\n",
    "    if element['document_type'] == 'text':\n",
    "        texts.append(Document(text=element['metadata']['content']))\n",
    "    elif element['document_type'] == 'structured':\n",
    "        tables.append(Document(text=element['metadata']['table_metadata']['table_content']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0f220a41-fc55-4b6c-95e1-28b41bfdba0d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(id_='9bad2140-a997-4af0-a4ea-c8236416def0', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='TestingDocument\\r\\nA sample document with headings and placeholder text\\r\\nIntroduction\\r\\nThis is a placeholder document that can be used for any purpose. It contains some \\r\\nheadings and some placeholder text to fill the space. The text is not important and contains \\r\\nno real value, but it is useful for testing. Below, we will have some simple tables and charts \\r\\nthat we can use to confirm Ingest is working as expected.\\r\\nTable 1\\r\\nThis table describes some animals, and some activities they might be doing in specific \\r\\nlocations.\\r\\nAnimal Activity Place\\r\\nGira@e Driving a car At the beach\\r\\nLion Putting on sunscreen At the park\\r\\nCat Jumping onto a laptop In a home o@ice\\r\\nDog Chasing a squirrel In the front yard\\r\\nChart 1\\r\\nThis chart shows some gadgets, and some very fictitious costs. Section One\\r\\nThis is the first section of the document. It has some more placeholder text to show how \\r\\nthe document looks like. The text is not meant to be meaningful or informative, but rather to \\r\\ndemonstrate the layout and formatting of the document.\\r\\n• This is the first bullet point\\r\\n• This is the second bullet point\\r\\n• This is the third bullet point\\r\\nSection Two\\r\\nThis is the second section of the document. It is more of the same as we’ve seen in the rest \\r\\nof the document. The content is meaningless, but the intent is to create a very simple \\r\\nsmoke test to ensure extraction is working as intended. This will be used in CI as time goes \\r\\non to ensure that changes we make to the library do not negatively impact our accuracy.\\r\\nTable 2\\r\\nThis table shows some popular colors that cars might come in.\\r\\nCar Color1 Color2 Color3\\r\\nCoupe White Silver Flat Gray\\r\\nSedan White Metallic Gray Matte Gray\\r\\nMinivan Gray Beige Black\\r\\nTruck Dark Gray Titanium Gray Charcoal\\r\\nConvertible Light Gray Graphite Slate Gray\\r\\nPicture\\r\\nBelow, is a high-quality picture of some shapes. Chart 2\\r\\nThis chart shows some average frequency ranges for speaker drivers.\\r\\nConclusion\\r\\nThis is the conclusion of the document. It has some more placeholder text, but the most \\r\\nimportant thing is that this is the conclusion. As we end this document, we should have \\r\\nbeen able to extract 2 tables, 2 charts, and some text including 3 bullet points.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "texts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "62478b8f-aba2-4233-918a-d3025e87c13e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(id_='542181ac-d3f2-490a-979b-cf0c6e24c1d4', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='locations. Animal Activity Place Giraffe Driving a car At the beach Lion Putting on sunscreen At the park Cat Jumping onto a laptop In a home office Dog Chasing a squirrel In the front yard', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n",
       " Document(id_='563e88da-dd0f-4121-98f9-c3d85f7a7bea', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='This chart shows some gadgets, and some very fictitious costs. >\\\\n7938.758 ext. Print & Maroon Bookshelf Fine Art Poems Collection dla Cemicon Diamtháhn | Gadgets and their cost\\nSollywood for Coasters | 19875.075     t158.281 \\n Hammer | 19871.55 \\n Powerdrill | 12044.625 \\n Bluetooth speaker | 7598.07 \\n Minifridge | 9916.305 \\n Premium desk', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n",
       " Document(id_='ab3c36fe-6812-41b3-9cf3-966f363e8e5f', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='This table shows some popular colors that cars might come in. Car Color1 Color2 Color3 Coupe White Silver Flat Gray Sedan White Metallic Gray Matte Gray Minivan Gray Beige Black Truck Dark Gray Titanium Gray Charcoal Convertible Light Gray Graphite Slate Gray', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'),\n",
       " Document(id_='440297f1-6e6f-4e5a-8af8-d56d7afa7b1c', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='This chart shows some average frequency ranges for speaker drivers TITLE | Chart 2 \\n Frequency Range Start (Hz) | Frequency Range Start (Hz) | Frequency Range End (Hz) \\n Twitter | 12800 | 12700 \\n Midrange | 13900 | 13000 \\n Midwoofer | 9600 | 13000 \\n Subwoofer | 0.00 | 13000', mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eaf6a9b8-83cf-4061-bf57-d50edc3978d0",
   "metadata": {},
   "source": [
    "Now, the text and table content is ready to be embedded and stored. We'll set our OpenAI api key in order to use OpenAI's embedding model, but any desired embedding model can be used here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a8d693a4-e647-4c20-bea4-56fe52c74541",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "# TODO: Add your OpenAI API key here\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPENAI_API_KEY>\"\n",
    "\n",
    "index = VectorStoreIndex.from_documents(texts+tables)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7e23e9d-a3ad-4b80-a356-1b233633b82d",
   "metadata": {},
   "source": [
    "Next, we'll use our vectorstore to create a query engine that handles the RAG pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c3eb210e-1106-4956-80a3-f950d30ac6c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2664333b-d30a-4846-801d-e484a18efe45",
   "metadata": {},
   "source": [
    "And finally, we can ask it questions about our example PDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f397a34b-4639-4f8a-81a8-5a5a416404d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The dog is chasing a squirrel in the front yard.'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_engine.query(\"What is the dog doing and where?\").response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee5de708-c5e3-43fd-8447-59fb97153eee",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
